{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Chunk Size Guide"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview\n",
    "When Featuretools calculates a feature matrix, it first groups the rows to be calculated into chunks. Each chunk is a collection of rows that will be computed at the same time. The results of calculating each chunk are combined into the single feature matrix that is returned to the user. The size of these chunks is determined by the `chunk_size` parameter, found in `dfs` and `calculate_feature_matrix`. If you wish to optimize chunk size, picking the right value will depend on the memory you have available and how often you’d like to get progress updates. This guide will go over a few example scenarios for choosing a good `chunk_size` and the factors to consider when doing so.\n",
    "\n",
    "![](../images/chunk_size_graph.png \"Chunk size versus calculation time\")\n",
    "\n",
    "### Peak memory usage\n",
    "If rows have the same cutoff time they are placed in the same chunk until the chunk is full, so they can be calculated simultaneously. By increasing the size of a chunk, it is more likely that there is room for all rows for a given cutoff time to be grouped together. This allows us to minimize the overhead of finding allowable data. The downside is that computation now requires more memory per chunk. If the machine in use can’t handle the larger peak memory usage this can start to slow down the process more than the time saved by removing the overhead.\n",
    "\n",
    "### Frequency of progress updates and overall runtime\n",
    "After the completion of a chunk, there is a progress update along with an estimation of remaining compute time provided to the user. Smaller chunks mean more fine-grained updates to the user. Even more, the average runtime over many smaller chunks is a better estimate of the overall remaining runtime.\n",
    "\n",
    "However, if we make the chunk size too small, we may split up rows that share the same cutoff time into separate chunks. This means Featuretools will have to slice the data for that cutoff time multiple times, resulting in repeated work. Additionally, if chunks get really small (e.g. one row per chunk), then the overhead from other parts of the calculation process will start to contribute more significantly to overall runtime."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The `chunk_size` parameter\n",
    "The size of each chunk is determined by the `chunk_size` parameter  in `dfs` or `calculate_feature_matrix`.  Valid inputs are:\n",
    "\n",
    "* `None` or unspecified\n",
    "\n",
    "```\n",
    "# not specified, so default option\n",
    "# uses 10% of rows per chunk\n",
    "# if total rows is less than 100, chunk size is 10\n",
    "fm = ft.calculate_feature_matrix(features)\n",
    "\n",
    "# None is the default option\n",
    "# same as above, 10% of rows per chunk or 10 rows per chunk, whichever is larger\n",
    "fm = ft.calculate_feature_matrix(features, chunk_size=None)\n",
    "```\n",
    "\n",
    "* A positive integer\n",
    "\n",
    "```\n",
    "# 25 rows per chunk\n",
    "fm = ft.calculate_feature_matrix(features, chunk_size=25)\n",
    "```\n",
    "\n",
    "* A float between 0 and 1 (percentage)\n",
    "\n",
    "```\n",
    "# 5% of all rows per chunk\n",
    "fm = ft.calculate_feature_matrix(features, chunk_size=.05)\n",
    "```\n",
    "\n",
    "* The string \"cutoff time\" \n",
    "\n",
    "```\n",
    "# use one chunk per unique cutoff time\n",
    "fm = ft.calculate_feature_matrix(features, chunk_size=\"cutoff time\")\n",
    "```\n",
    "\n",
    "Unlike the other options, \"cutoff time\" does not genereate a specific number of rows per chunk.  Instead of trying to create uniformly sized chunks, Featuretools will calculate every row with the same time together. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Instacart Dataset\n",
    "\n",
    "This example makes use the Instacart dataset seen in the [Predict Next Purchase demo](https://github.com/featuretools/predict_next_purchase). Sets of labels and cutoff times are made at 3 month increments, but due to the nature of the data the first set of cutoff times accounts for nearly half of all rows in the cutoff time dataframe. Both the single chunk and cutoff time chunking scenarios will load all of that data in a single slice.  As we can see by comparing the single chunk or grouping by cutoff time cases with the smaller chunk size cases, keeping the entire cutoff time group together slows down the computation on this computer (4 cores @ 2.50 GHz, 8 GB RAM). Smaller chunk sizes than the default 10% lead to faster run times at first, continuing to reduce the size of each chunk starts to slow down the computation again. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "import datetime\n",
    "\n",
    "import featuretools as ft\n",
    "import matplotlib.ticker as ticker\n",
    "import pandas as pd\n",
    "import seaborn\n",
    "from featuretools.computational_backends import bin_cutoff_times\n",
    "\n",
    "\n",
    "# assign locator and formatter for the xaxis ticks.\n",
    "def timeTicks(x, pos):                                                                                                                                                                                                                                                         \n",
    "    d = datetime.timedelta(seconds=x / 1000000000. )                                                                                                                                                                                                                                          \n",
    "    return str(d)                                                                                                                                                                                                                                                              \n",
    "formatter = ticker.FuncFormatter(timeTicks)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "            user_id  label\n",
      "time                      \n",
      "2015-03-15     7790   7790\n",
      "2015-06-15     4458   4458\n",
      "2015-09-15     2515   2515\n",
      "2015-12-15     1091   1091\n"
     ]
    }
   ],
   "source": [
    "import utils\n",
    "\n",
    "# import entityset\n",
    "instacart_es = utils.load_entityset(\"partitioned_data/part_1/\")\n",
    "\n",
    "# make labels and cutoff times\n",
    "label_times = pd.concat([utils.make_labels(es=instacart_es,\n",
    "                                           product_name = \"Banana\",\n",
    "                                           cutoff_time = pd.Timestamp('March 15, 2015'),\n",
    "                                           prediction_window = ft.Timedelta(\"4 weeks\"),\n",
    "                                           training_window = ft.Timedelta(\"60 days\")),\n",
    "                         utils.make_labels(es=instacart_es,\n",
    "                                           product_name = \"Banana\",\n",
    "                                           cutoff_time = pd.Timestamp('June 15, 2015'),\n",
    "                                           prediction_window = ft.Timedelta(\"4 weeks\"),\n",
    "                                           training_window = ft.Timedelta(\"60 days\")),\n",
    "                         utils.make_labels(es=instacart_es,\n",
    "                                           product_name = \"Banana\",\n",
    "                                           cutoff_time = pd.Timestamp('September 15, 2015'),\n",
    "                                           prediction_window = ft.Timedelta(\"4 weeks\"),\n",
    "                                           training_window = ft.Timedelta(\"60 days\")),\n",
    "                         utils.make_labels(es=instacart_es,\n",
    "                                           product_name = \"Banana\",\n",
    "                                           cutoff_time = pd.Timestamp('December 15, 2015'),\n",
    "                                           prediction_window = ft.Timedelta(\"4 weeks\"),\n",
    "                                           training_window = ft.Timedelta(\"60 days\")),],\n",
    "                       ignore_index=True)\n",
    "\n",
    "print(label_times.groupby('time').count())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1 chunk"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Built 76 features\n",
      "Elapsed: 04:20 | Remaining: 00:00 | Progress: 100%|██████████| Calculated: 1/1 chunks\n"
     ]
    }
   ],
   "source": [
    "start = time.time()\n",
    "feature_matrix, features = ft.dfs(target_entity=\"users\", \n",
    "                                  cutoff_time=label_times,\n",
    "                                  training_window=ft.Timedelta(\"60 days\"),\n",
    "                                  entityset=instacart_es,\n",
    "                                  chunk_size=15854,\n",
    "                                  verbose=True)\n",
    "stop = time.time()\n",
    "one_chunk_duration_4 = datetime.timedelta(seconds=stop-start)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 10% per chunk"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Built 76 features\n",
      "Elapsed: 03:18 | Remaining: 00:00 | Progress: 100%|██████████| Calculated: 11/11 chunks\n"
     ]
    }
   ],
   "source": [
    "start = time.time()\n",
    "feature_matrix, features = ft.dfs(target_entity=\"users\", \n",
    "                                  cutoff_time=label_times,\n",
    "                                  training_window=ft.Timedelta(\"60 days\"),\n",
    "                                  entityset=instacart_es,\n",
    "                                  verbose=True)\n",
    "stop = time.time()\n",
    "default_duration_4 = datetime.timedelta(seconds=stop-start)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 5% per chunk"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Built 76 features\n",
      "Elapsed: 03:10 | Remaining: 00:00 | Progress: 100%|██████████| Calculated: 21/21 chunks\n"
     ]
    }
   ],
   "source": [
    "start = time.time()\n",
    "feature_matrix, features = ft.dfs(target_entity=\"users\", \n",
    "                                  cutoff_time=label_times,\n",
    "                                  training_window=ft.Timedelta(\"60 days\"),\n",
    "                                  entityset=instacart_es,\n",
    "                                  chunk_size=.05,\n",
    "                                  verbose=True)\n",
    "stop = time.time()\n",
    "five_percent_per_chunk_duration = datetime.timedelta(seconds=stop-start)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2.5% per chunk"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Built 76 features\n",
      "Elapsed: 03:09 | Remaining: 00:00 | Progress: 100%|██████████| Calculated: 41/41 chunks\n"
     ]
    }
   ],
   "source": [
    "start = time.time()\n",
    "feature_matrix, features = ft.dfs(target_entity=\"users\", \n",
    "                                  cutoff_time=label_times,\n",
    "                                  training_window=ft.Timedelta(\"60 days\"),\n",
    "                                  entityset=instacart_es,\n",
    "                                  chunk_size=.025,\n",
    "                                  verbose=True)\n",
    "stop = time.time()\n",
    "two_point_five_per_chunk_duration = datetime.timedelta(seconds=stop-start)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1% per chunk"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Built 76 features\n",
      "Elapsed: 03:24 | Remaining: 00:00 | Progress: 100%|██████████| Calculated: 101/101 chunks\n"
     ]
    }
   ],
   "source": [
    "start = time.time()\n",
    "feature_matrix, features = ft.dfs(target_entity=\"users\", \n",
    "                                  cutoff_time=label_times,\n",
    "                                  training_window=ft.Timedelta(\"60 days\"),\n",
    "                                  entityset=instacart_es,\n",
    "                                  chunk_size=.01,\n",
    "                                  verbose=True)\n",
    "stop = time.time()\n",
    "one_percent_per_chunk_duration = datetime.timedelta(seconds=stop-start)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### \"cutoff time\" option"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Built 76 features\n",
      "Elapsed: 04:20 | Remaining: 00:00 | Progress: 100%|██████████| Calculated: 4/4 chunks\n"
     ]
    }
   ],
   "source": [
    "start = time.time()\n",
    "feature_matrix, features = ft.dfs(target_entity=\"users\", \n",
    "                                  cutoff_time=label_times,\n",
    "                                  training_window=ft.Timedelta(\"60 days\"),\n",
    "                                  entityset=instacart_es,\n",
    "                                  chunk_size=\"cutoff time\",\n",
    "                                  verbose=True)\n",
    "stop = time.time()\n",
    "cutoff_duration_4 = datetime.timedelta(seconds=stop-start)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAawAAAEWCAYAAAA6maO/AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvFvnyVgAAIABJREFUeJzt3Xu8XPO9//HXm0REJETCqbgltEpRUXH5oaWKom516a9o9fJrUY3e0ON2UKXVg6pWS0MbtBqXFOdHlTinFa22NNFIhLhHhBDikgtCks/5Y31HViaz956dzMzON/v9fDzmsdf6znet9fnOzJ7PfL/rO2sUEZiZma3oVunqAMzMzOrhhGVmZllwwjIzsyw4YZmZWRacsMzMLAtOWGZmlgUnLLNuRNLGkuZKWrWrY2k2SV+T9FJq7wBJu0p6Iq0f0sl9nS7pqmbFavVxwrLlImm4pHGS5ku6uuq+wZIivUFUbv9Run8dSTdImiXpFUnXSepXun+qpLdK245pI4b/Scfp0cb91XFMlXRqgx6ChkmPx8uS/tpOnXMk/bZGeUh6f0fHiIhpEbFmRCxc3ng7S1I/ST+RNC09D0+l9YF1bLuHpOmdOFZP4MfAPqm9s4BzgcvS+q1V9cuv0UVVr7ujI+IHEfGVzrbZGqvmP7hZJ7wAnAd8EujdRp21I2JBjfLzgP7AEEDA74FzgO+U6hwYEf/d1sElHQ30rDPWtSNigaRhwFhJ4yPi7jq3bYUfAY+yEn6QlLQa8D/A68C+wBRgIHAcsCNwR4MP+W/A6sDkUtkmVevviYg1S7FOBb7S3uvOusZK949hrRURN6dPq7OWYfMhwK0RMTsi3gBuAbaqd2NJawFnA9/tzEEjYhzFG9fQ0r62lHSPpNclTZZ0UCofkspWSetXSppZ2u43kr6Vlr8o6WlJcyQ9k5JpvW3ZBdgaGNmZtrSxr3skfV/SfSmWMZVeTKm32aPUvrGp3t2SLqv04Gr1alLvdK+0vIqkU1NPaZakGyWt00ZYxwAbA5+OiEciYlFEzIyI70fEHWl/S/QSJV0t6TxJfYA/AoNKvZ5BknqlHtoL6faTVLY58FjazeuS/iTpKWBT4La0fa9OPqbnlB6XymP4JUnPSXpN0vGSdpA0Mb1eLqva/suSHk1175K0SSqXpEskzZQ0W9IkSVt3JrbuxAnLWuFZSdMljawa/vk5cICk/pL6A4dRvDGVXadimGyMpG2r7vsBcDnwYmeCkbQzRXJ4Mq33BG4DxgDrASem434wIp4BZgPbpc0/BsyVtGVa352it9YH+CmwX0T0BXYBJtQZz6rAZcBwoFHXSjsK+FJqz2rAyW3U+x0wnqK3833gC504xonAIRSPwSDgNYrntJa9gDsjYm4n9g9ARMwD9gNeSMN5a0bEC8AZwM4UHzy2peipnRkRj7P4g8/aEbFnRGwGTKPosa8ZEfM7G0cNOwEfAP4v8JMUz17p2J+RtDuApIOB04FDgXWBvwCj0j72oXhNbQ6sBXyGZfvw1y04YVkzvQLsQDEUsz3QF7iudP+DFG+ms9JtIfCL0v1HA4PT9n8G7pK0NkAa1tsV+Fln4pH0FvD3dJzKeYydgTWBCyLinYj4E3A7cGS6fyywu6T3pfXRaX0I0A94KJUvAraW1DsiZkREzeGnGr4B3B8R4zvRlo6MjIjHI+It4EZKvckKSRtTPD//ERHzI+JeisRdr+OBMyJiekoA5wCHq/a5xAHAjM42ogNHA+emntrLwPeAzzf4GO35fkS8HRFjgHnAqBTL8xRJqfIh53jghxHxaBoa/wEwNPWy3qX4v9gCUKrT6MdppeGEZU0TEXMjYlxELIiIlyh6EPtI6puq3Ag8TvEP2w94Cvhtafv7IuKtiHgzIn5Icf7jo2l47hfAN9s4N9aWgRSJ6SRgDxaf+xoEPBcRi0p1nwU2SMtjU/2PAfcC91D0KnYH/pKGt+ZRfNI+Hpgh6Q+StugoIEmDKBLWGXW2YQFV5+xSDxGKN7+Kcq/zTYp2VxsEvJZir3i2zjig+CBxSxoCe53i/NtCivNH1WYB63di3/UYxJLxPpvKWuWl0vJbNdYrj/kmwKWlx+lVinO2G6QPR5dR9ExnShqh0sQjW5ITlrVSZbir8robCvwyIualoaIrgP072F4UyW0YcIOkF4F/pvunS/pouwFELIyIHwNvAyek4heAjSrnqZKNgefT8ljgoxRJayzwV4re3e5pvbLvuyJib4o35inAle3FkuyY6j+S2nIpsKOkF1V76vk0il5n2RCKRPb8UrXbNwPon4YzKzYuLc8D1qispHjWLd3/HMUQ6Nql2+qph1Htv4FPVh2r2pvl4wHvKy3XGip9gSIZlGN/oZ39d5XngOOqHqfeEfE3gIj4aURsD3yIYmjwlK4MdkXmhGXLRVIPSasDqwKrSlq9dEJ/J0kfTCfnB1Cc47knTbCAItF8RVJvSb2BY4GJaduNVXxvZrW0z1Moekj3AW9QfJIemm6VJLc9cH+doV8AfDfFfj/Fm+V3JfWUtAdwIHA9QEQ8QfGJ+XPA2IiYTfFp+jBSwpL0b5IOTm/I84G5FEOE5ZP0g2vE8UeKBFRpy1nAv4ChbUw9vxPYQtLnU6zrUAwx/b6TvU0i4llgHPC99Djvltpd8TiwuqRPpV7cmUB5ssIVwPmlCQTrpvM1tfyG4o3795K2qLwmVHy/qfL8TQCOkrSqpH0pPhBUvAQMUDHRpmIUcGY67kCKx26pKf8rgCuA0yRtBcVkIUlHpOUd0v9JT4oPCG+TXje2NCcsW15nUryZn0rxhv5WKoNiVtadwBzgYYo38iNL236Z4s16OkXvYFMWn/TvSzGh4rV0374Un+ZnReHFyg14OW3zUkS8U2fcf0j7/mra5kCKE/uvUAw3HhMRU0r1xwKzIuK50roozsNB8b/0HYpP+K9SvNl+Ld23EcVw1VI9j3TuqNyWN4B30/JSImJmivM4YCbF4/p66ViddRTF5IFXKWZcXls61hsUvdCrUuzzKJ6rikuB/w+MkTQH+EfaV62451NMSJgC3E0xkeUBig8hlQ8Z36R4Hl6nOD91a2n7KRQJ6uk0tDaI4msR4yg+5EyieC7OW7aHoXki4haKryxcL2k2xXO2X7q7H0VP/DWK18gs4MKuiDMH8g84mjWXpDOBlyPil10dS0cknQO8PyI+19WxmFXzF4fNmiwiVrhP/WY58pCgmZllwUOCZmaWBfewzMwsCz6H1UADBw6MwYMHd3UYZmZZGT9+/CsRsW5H9ZywGmjw4MGMGzeuq8MwM8uKpLqusOIhQTMzy4ITlpmZZcFDgg306PRZbH/KtR1XNLOsjb/wmK4OoVtyD8vMzLLghGVmZllwwjIzsyw4YZmZWRacsMzMLAtOWGZmlgUnLDMzy4ITlpmZZcEJy8zMsuCEZWZmWeg2CUvSryXNlPRwqewcSc9LmpBu+6fynpKukTRJ0qOSTuu6yM3MDLpRwgKuBvatUX5JRAxNtztS2RFAr4jYBtgeOE7S4JZEaWZmNXWbhBUR9wKv1lsd6COpB9AbeAeY3azYzMysY90mYbVjuKSJaciwfyobDcwDZgDTgIsiomayk3SspHGSxi14c06LQjYz6366e8K6HNgMGEqRnC5O5TsCC4FBwBDgJEmb1tpBRIyIiGERMazHGn1bELKZWffUrRNWRLwUEQsjYhFwJUWiAjgKuDMi3o2ImcB9wLCuitPMzLp5wpK0fmn100BlBuE0YM9Upw+wMzCltdGZmVlZt/nFYUmjgD2AgZKmA2cDe0gaSjHJYipwXKr+c2CkpMmAgJERMbHlQZuZ2Xu6TcKKiCNrFP+qjbpzKaa2m5nZCqJbDwmamVk+nLDMzCwLTlhmZpYFJywzM8uCE5aZmWXBCcvMzLLghGVmZllwwjIzsyw4YZmZWRa6zZUuWmHLDQcw7sJjujoMM7OVkntYZmaWBScsMzPLghOWmZllwQnLzMyy4IRlZmZZcMIyM7MseFp7A70zYzLTzt2mq8Mwsxo2PmtSV4dgy8k9LDMzy4ITlpmZZcEJy8zMsuCEZWZmWXDCMjOzLDhhmZlZFpywzMwsC05YZmaWBScsMzPLghOWmZllwQnLzMyy0G0SlqTVJT0g6SFJkyV9L5XvKelBSQ9LukZSj9I2e0iakOqP7brozcys2yQsYD6wZ0RsCwwF9pW0C3AN8NmI2Bp4FvgCgKS1gV8AB0XEVsARXRO2mZlBN0pYUZibVnum20LgnYh4PJXfDRyWlo8Cbo6IaWn7ma2M18zMltRtEhaApFUlTQBmUiSnB4AekoalKocDG6XlzYH+ku6RNF7SMW3s81hJ4ySNe3XewmY3wcys2+pWv4cVEQuBoWm47xZgK+CzwCWSegFjKHpdUDw22wOfAHoDf5f0j1JvrLLPEcAIgA9v0Dta0hAzs26oWyWsioh4XdKfgX0j4iLgowCS9qHoWQFMB2ZFxDxgnqR7gW2Bx2vt08zMmqvbDAlKWjf1rJDUG9gbmCJpvVTWC/h34Iq0yX8Bu0nqIWkNYCfg0dZHbmZm0L16WOsD10halSJR3xgRt0u6UNIBqezyiPgTQEQ8KulOYCKwCLgqIh7uquDNzLq7bpOwImIisF2N8lOAU9rY5kLgwiaHZmZmdeg2Q4JmZpY3JywzM8uCE5aZmWXBCcvMzLLghGVmZllwwjIzsyw4YZmZWRacsMzMLAvd5ovDrbDa+lux8VnjujoMM7OVkntYZmaWBScsMzPLghOWmZllwQnLzMyy4IRlZmZZcMIyM7MseFp7A02ZOYVdf7ZrV4dhZm2478T7ujoEWw7uYZmZWRacsMzMLAtOWGZmlgUnLDMzy4ITlpmZZcEJy8zMsuCEZWZmWagrYUnaRNJeabm3pL7NDcvMzGxJHSYsSV8FRgO/TEUbArc2MygzM7Nq9fSwvg7sCswGiIgngPWaGZSZmVm1ehLW/Ih4p7IiqQcQzQvJzMxsafUkrLGSTgd6S9obuAm4rblhNZ6kD0qaULrNlvQtSdtK+rukSZJuk9Qv1d9b0vhUPl7Snl3dBjOz7qyehHUq8DIwCTgOuCMizmhqVE0QEY9FxNCIGApsD7wJ3AJcBZwaEduk9VPSJq8AB6byLwC/6YKwzcwsqSdhHQ1cHxFHRMThEXGlpAOaHViTfQJ4KiKeBTYH7k3ldwOHAUTEvyLihVQ+maKH2avlkZqZGVBfwvoZ8BdJW5bKzm1SPK3yWWBUWp4MHJyWjwA2qlH/MODBiJhffYekYyWNkzTu3bnvNiVYMzOrL2E9A3wZGC3piFSm5oXUXJJWAw6iOBcHRdtOkDQe6Au8U1V/K+BHFMOhS4mIERExLCKG9VyzZ/MCNzPr5ur5AceIiAcl7Q6MkrQTsGqT42qm/Sh6Sy8BRMQUYB8ASZsDn6pUlLQhxXmtYyLiqS6I1czMknp6WDMAIuIV4JMUU9q3bmZQTXYki4cDkbRe+rsKcCZwRVpfG/gDxYQM/0ypmVkX6zBhRcSnSsuLIuKUiMjyGoSS+gB7AzeXio+U9DgwBXgBGJnKhwPvB84qTYX3F6bNzLpIm0OCkn4SEd+SdBs1vigcEQc1NbImiIh5wICqskuBS2vUPQ84r0WhmZlZB9o7h1X53tFFrQjEzMysPW0mrIgYn/6OrZRJ6g9sFBETWxCbmZnZe+q5Wvs9kvpJWgd4ELhS0o+bH5qZmdli9UyeWCsiZgOHAtdGxE7AXs0Ny8zMbEn1JKwektYHPgPc3uR4zMzMaqonYZ0L3AU8GRH/lLQp8ERzwzIzM1tSh1e6iIibWHwZIyLiadIFYs3MzFolyy8Am5lZ91PPtQStTlustwX3neirOJmZNUM909qX+g2oNMXdzMysZeoZErxZ0nu/m5FmDN7dvJDMzMyWVk/CuhW4UdKqkgZTzBg8rZlBmZmZVatnluCV6UcPbwUGA8dFxN+aHZiZmVlZe1dr/055FdgYmADsLGnniPDlmczMrGXa62H1rVq/uY1yMzOzpmvvau3fa2UgZmZm7enwHJakzYGTKc5fvVc/IvZsXlh5mvPYY4z92O5dHYaZLafd7x3bcSVruXq+OHwTcAVwFbCwueGYmZnVVk/CWhARlzc9EjMzs3bU8z2s2ySdIGl9SetUbk2PzMzMrKSeHtYX0t9TSmUBbNr4cMzMzGqr54vDQ1oRiJmZWXvqmSV4TK3yiLi28eGYmZnVVs+Q4A6l5dWBTwAPAk5YZmbWMvUMCZ5YXpe0NnB90yIyMzOrYVl+cXge4PNaZmbWUvWcw7qNYlYgFAnuQ8CNzQzKzMysWj3nsC4qLS8Ano2I6U2KZ7lJ+jVwADAzIrZOZecAXwVeTtVOj4g70g9TXgV8hOKxuDYifpi2mQrMobi6x4KIGNbKdpiZ2ZLqOYeV20W1rgYuY+lJIZdExEVVZUcAvSJiG0lrAI9IGhURU9P9H4+IV5oarZmZ1aXDc1iSDpX0hKQ3JM2WNEfS7FYEtywi4l7g1XqrA30k9QB6A+8AK2zbzMy6s3omXfwncFBErBUR/SKib0T0a3ZgTTBc0kRJv5bUP5WNpphEMgOYBlwUEZVkF8AYSeMlHdvWTiUdK2mcpHFvvPtuUxtgZtad1ZOwXoqIR5seSXNdDmwGDKVIThen8h0pzlENopj5eJKkyiWndouIjwD7AV+X9LFaO46IERExLCKGrdWzZzPbYGbWrbV5DkvSoWlxnKQbgFuB+ZX7I+LmmhuugCLipcqypCuB29PqUcCdEfEuMFPSfcAw4OmIeD5tO1PSLRTJ7d7WRm5mZhXt9bAOTLd+wJvAPqWyA5ofWuNIWr+0+mng4bQ8Ddgz1ekD7AxMkdRHUt9S+T6lbczMrAu02cOKiC+1MpBGkTQK2AMYKGk6cDawh6ShFOelpgLHpeo/B0ZKmgwIGBkRE9Ow4C2SoHiMfhcRd7a0IWZmtoR6vjh8DfDNiHg9rfcHLo6ILzc7uGUREUfWKP5VG3XnUkxtry5/Gti2waGZmdlyqGfSxYcryQogIl4DtmteSGZmZkurJ2GtUpoGTvq14XqukGFmZtYw9SSei4G/S7oprR8BnN+8kMzMzJZWz6WZrpU0jjSbDjg0Ih5pblhmZmZLqmtoLyUoJykzM+syy/J7WGZmZi3nhGVmZllwwjIzsyw4YZmZWRb8faoG6vvBD7L7vbn93qWZWR7cwzIzsyw4YZmZWRacsMzMLAtOWGZmlgUnLDMzy4ITlpmZZcHT2hto5vQ3uOyk27o6DDOzlhp+8YEtOY57WGZmlgUnLDMzy4ITlpmZZcEJy8zMsuCEZWZmWXDCMjOzLDhhmZlZFpywzMwsC05YZmaWBScsMzPLQpYJS9KvJc2U9HCp7BxJz0uakG77p/Kekq6RNEnSo5JOK20zNZVPkDSuxnFOkhSSBramZWZm1pZcryV4NXAZcG1V+SURcVFV2RFAr4jYRtIawCOSRkXE1HT/xyPileoDSNoI2AeY1tDIzcxsmWTZw4qIe4FX660O9JHUA+gNvAPMrmO7S4Dvpu3NzKyLZZmw2jFc0sQ0ZNg/lY0G5gEzKHpLF0VEJdkFMEbSeEnHVnYi6WDg+Yh4qKMDSjpW0jhJ4+a++UZjW2NmZu9ZmRLW5cBmwFCK5HRxKt8RWAgMAoYAJ0naNN23W0R8BNgP+Lqkj6Vhw9OBs+o5aESMiIhhETFszTXWalxrzMxsCStNwoqIlyJiYUQsAq6kSFQARwF3RsS7ETETuA8YlrZ5Pv2dCdySttmMIrE9JGkqsCHwoKT3tbI9Zma2pJUmYUlav7T6aaAyg3AasGeq0wfYGZgiqY+kvqXyfYCHI2JSRKwXEYMjYjAwHfhIRLzYoqaYmVkNWc4SlDQK2AMYKGk6cDawh6ShFOelpgLHpeo/B0ZKmgwIGBkRE9Ow4C2SoHgcfhcRd7a0IWZmVrcsE1ZEHFmj+Fdt1J1LMbW9uvxpYNs6jjW4s/GZmVnjrTRDgmZmtnJzwjIzsyw4YZmZWRacsMzMLAtOWGZmlgUnLDMzy4ITlpmZZcEJy8zMsuCEZWZmWcjyShcrqvU2XIvhFx/Y1WGYma2U3MMyM7MsOGGZmVkWnLDMzCwLTlhmZpYFJywzM8uCE5aZmWXB09obaMYzT3H+5w7v6jDMzFrmjN+Obtmx3MMyM7MsOGGZmVkWnLDMzCwLTlhmZpYFJywzM8uCE5aZmWXBCcvMzLLghGVmZllwwjIzsyw4YZmZWRacsMzMLAsrRMKSNFjSUXXWHSVpoqRvS9pC0gRJ/5K0WVW906vW/9bImM3MrLVWiIQFDAY6TFiS3gfsEBEfjohLgEOA0RGxXUQ8VVV9iYQVEbs0KlgzM2u9piUsScekntBDkn6Tyq6WdHipzty0eAHw0dRb+rak1SWNlDQp9Z4+nuqNATZI9c4GvgV8TdKfq459AdA71buufCxJe0gaK+m/JD0t6QJJR0t6IB1vs1RvXUm/l/TPdNu1WY+VmZl1rCk/LyJpK+BMYJeIeEXSOh1scipwckQckLY/CYiI2EbSFsAYSZsDBwG3R8TQVE/A3Ii4qLyziDhV0vBKvRq2BbYEXgWeBq6KiB0lfRM4kSIRXgpcEhF/lbQxcFfaprqtxwLHAqy1Ru8OmmlmZsuqWb+HtSdwU0S8AhARr3Zy+92An6Vtp0h6FtgcmN2g+P4ZETMAJD1F0XMDmARUenN7AR8qciIA/SStGRFzyzuKiBHACIANBvSPBsVnZmZVWv0DjgtIw5CSVgFWa/HxK+aXlheV1hex+DFZBdg5It5uZWBmZlZbs85h/Qk4QtIAgNKQ4FRg+7R8ENAzLc8B+pa2/wtwdNp2c2Bj4LFOxvCupJ4dV2vTGIrhQVIcbQ0vmplZCzQlYUXEZOB8YKykh4Afp7uuBHZPZf8HmJfKJwIL0wSNbwO/AFaRNAm4AfhiRMync0YAEyuTLpbBN4BhaeLII8Dxy7gfMzNrAEX4tEujbDCgf5yw3ye6Ogwzs5Y547ejl3sfksZHxLCO6q0o38MyMzNrlxOWmZllwQnLzMyy4IRlZmZZcMIyM7MsOGGZmVkWnLDMzCwLTlhmZpaFVl9LcKW2/pDNGvIlOjMzW5p7WGZmlgUnLDMzy4ITlpmZZcEXv20gSXPo/M+g5G4g8EpXB9FibnP30R3b3RVt3iQi1u2okiddNNZj9VxxeGUiaZzbvPLrjm2G7tnuFbnNHhI0M7MsOGGZmVkWnLAaa0RXB9AF3ObuoTu2Gbpnu1fYNnvShZmZZcE9LDMzy4ITlpmZZcEJK5G0r6THJD0p6dQa9/eSdEO6/35Jg0v3nZbKH5P0yTb2PyRt92Taz2od7bfZWtDmqyU9I2lCug1N5ZL007T9REkfaVYba8S0TG2WNEDSnyXNlXRZHcc5SVJIGpjWc2zz3pLGS5qU/u7Zxv6/n9o0QdIYSYNSeY5t3rH0en1I0qfb2P91af8PS/q1pJ6pPLs2l+7fOL2+T25j/8PTtu+9rlN569ocEd3+BqwKPAVsCqwGPAR8qKrOCcAVafmzwA1p+UOpfi9gSNrPqjWOcSPw2bR8BfC19va7krT5auDwGuX7A38EBOwM3J9Bm/sAuwHHA5d1cJyNgLuAZ4GBGbd5O2BQWt4aeL6NY/QrLX+jtK8c27wG0CMtrw/MrKzXeA0r3UaV/p+za3Pp/tHATcDJbRxjO2AwMLXyum51m93DKuwIPBkRT0fEO8D1wMFVdQ4GrknLo4FPSFIqvz4i5kfEM8CTaX/vSfX2TNuR9nNIB/tttqa2uQMHA9dG4R/A2pLWX57G1GmZ2xwR8yLir8DbdRznEuC7QHlGU45t/ldEvJDKJwO9JfWqPkBEzC6t9mFxu3Ns85sRsSCVr86Sz+F7IuKO1K4AHgA2LO03qzYDSDoEeIbiea4pvR6m1rirZW12wipsADxXWp8ObCDpXEkHVddJL+g3gAFtbQsg6Y40PDIAeL30j/BenXb222zNbnPF+WmY4JLSm12b2zfZ8rS5TZKukjQsLR9M0RN5qJ5jL2tDOqFRbT4MeDAi5sOSbU7r50t6DjgaOKu9YzekVe1brjZL2knSZGAScHzl/7bGa5s0FPh54M72jt3AtrVlmdssaU3g34HvVe+0VpvrPfYytaIDvjRTOyLirI5rtbv9/gDl8d4VXaPanJwGvEgxRDGC4p/i3OXZfzM0oM1fAZC0BnA6sE8j4mqmzrRZ0lbAjyi1q9Lm0voZwBmSTgOGA2c3KNSGqbfNEXE/sJWkLYFrJP0xIt6uem1X/AK4NyL+0shYG6XONp8DXBIRc6sHd9poc5dxD6vwPMV5h4oNU1nNOpJ6AGsBs+rcdhZFN7lHjTpt7bfZmt1mImJGGiaYD4xk8bBhXds3wfK0uR6bUZzTe0jS1LT/ByW9r85jN8NytVnShsAtwDER8VQdx7uOojdW77GboSHPc0Q8CsylOH+3FElnA+sC3+nksZthedq8E/Cf6TX7LeB0ScMbfOzGaNbJsZxuFD3NpynebConLLeqqvN1ljxheWNa3oolJyA8Te0JCDex5KSLE9rb70rS5vXTXwE/AS5I659iyZO0D6zobS7d/0U6mHRRqjuVxZMusmszsHaqf2gHx/hAaflEYHTGbR7C4kkXmwAvUJpgUNr+K8DfgN5V5dm1uarOObQx6aLW67rVbW76A5nLjWKmy+MUM23OSGXnAgel5dUpks6TFCdZNy1te0ba7jFgv1L5HSyeZbVp2u7JtJ9eHe13JWjznyjOAzwM/BZYM5UL+HnafhIwLJM2TwVepfjUPZ00Cwu4qlYbWDJhZddm4ExgHjChdFuvus3A79NzPBG4Ddgg4zZ/nmLiwQTgQeCQNl7bC9K+K4/LWbm2uWof51BKWFVt/kZ63S+gSORXtbrNvjSTmZllweewzMwsC05YZmaWBScsMzPLghOWmZllwQnLzMyy4IRllilJa0s6IS0PkjS6o23McuZp7WaZSj8PcXtE1LwSg9nKxtcSNMvXBcBmkiYATwBbRsTWkr5I8WsAfYAPABdRXP3g88B8YP+IeFXSZhRf+FwXeBP4akRMaX0zzOrjIUGzfJ0KPBURQ4FTqu7Vt65mAAAA00lEQVTbGjgU2AE4H3gzIrYD/g4ck+qMAE6MiO2Bkyku5Gq2wnIPy2zl9OeImAPMkfQGxSWToLh0zofTT0rsAtxUukL3Ur91ZbYiccIyWznNLy0vKq0vovi/X4XiN9qGtjows2XlIUGzfM0B+i7LhlH8SvAzko6A4lexJW3byODMGs0JyyxTETELuE/Sw8CFy7CLo4H/J+khiiuUV/+kutkKxdPazcwsC+5hmZlZFpywzMwsC05YZmaWBScsMzPLghOWmZllwQnLzMyy4IRlZmZZ+F8JzQcdCXjExgAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "results_df_4 = pd.DataFrame({'time': [one_percent_per_chunk_duration,\n",
    "                                      two_point_five_per_chunk_duration,\n",
    "                                      five_percent_per_chunk_duration,\n",
    "                                      default_duration_4,\n",
    "                                      one_chunk_duration_4,\n",
    "                                      cutoff_duration_4],\n",
    "                           'chunk size': ['158', '396', '792', '1585', '15854', 'cutoff time']})\n",
    "ax = seaborn.barplot(x='time', y='chunk size', data=results_df_4)\n",
    "ax.set_title(\"15854 Rows, 4 Unique Cutoff Times\")\n",
    "ax.xaxis.set_major_formatter(formatter)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Summary \n",
    "\n",
    "In review, featuretools uses a parameter `chunk_size` to divide up the instances when calculating features.  Creating chunks that have the right amount of data can speed up calculations. If you need to speed up calculations, experimenting with different chunk sizes on a subset of the data before calculating on all the data can help find the best size.   "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
